Fast kernel matrix-vector multiplication with application to Gaussian process learning

Author

  • Alexander Gray
Abstract

A number of core computational problems in machine learning, both old and new, can be cast as a matrix-vector multiplication between a kernel matrix or class-probability matrix and a vector of weights. This arises prominently, for example, in the kernel estimation methods of nonparametric statistics, many common probabilistic graphical models, and the more recent kernel machines. After highlighting the existence of this computational problem in several well-known machine learning methods, we focus, for clarity, on a solution for one specific example, Gaussian process (GP) prediction, one whose applicability has been particularly hindered by this computational barrier. We demonstrate the application of a recent N-body approach developed specifically for statistical problems, employing adaptive computational geometry and finite-difference approximation. This core algorithm reduces the O(N²) matrix-vector multiplications within GP learning to O(N), making the resulting overall learning algorithm O(N). GP learning for N = 1 million points is demonstrated.

1 Kernel Matrix-Vector Multiplications in Learning

A kernel matrix Φ contains the kernel interaction of each point x_q in a query (test) dataset X_Q (of size N_Q) with each point x_r in a reference (training) dataset X_R (of size N_R), where the kernel function K(·) often has some scale parameter σ (the 'bandwidth'). Often the 'kernel function' is actually a probability density function, such as the Gaussian; in such cases Φ is typically a class-probability matrix. The query and reference sets can be the same set. Often the core computational cost of a statistical method boils down to a multiplication of this matrix Φ with some vector of weights w. For example, in the weighted form of kernel density estimation, the density estimate at the q-th test point x_q is p̂(x_q) = Σ_{r=1}^{N_R} w_r K_σ(x_q, x_r), which is exactly the q-th entry of the matrix-vector product Φw.
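To make the cost concrete, here is a brute-force version of the product described above. This is only an illustrative NumPy sketch (the function name, toy data, and bandwidth are ours, not the paper's); it spells out the O(N_Q · N_R) kernel summation that the paper's tree-based, N-body machinery is designed to approximate cheaply.

    import numpy as np

    def gaussian_kernel_matvec(XQ, XR, w, sigma):
        """Brute-force Phi @ w with Phi[q, r] = K_sigma(x_q, x_r) for a Gaussian kernel.

        Cost is O(N_Q * N_R * d); a tree-based method replaces exactly this step.
        """
        # Squared distances between every query point and every reference point.
        d2 = ((XQ[:, None, :] - XR[None, :, :]) ** 2).sum(axis=-1)   # shape (N_Q, N_R)
        Phi = np.exp(-d2 / (2.0 * sigma ** 2))                       # kernel matrix
        return Phi @ w                                                # weighted kernel sums

    # Toy usage: weighted density estimates at 5 test points from 1000 training points.
    rng = np.random.default_rng(0)
    XR = rng.normal(size=(1000, 2))
    XQ = rng.normal(size=(5, 2))
    w = np.full(1000, 1.0 / 1000)    # uniform weights give ordinary KDE up to normalization
    print(gaussian_kernel_matvec(XQ, XR, w, sigma=0.5))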


Similar resources

Large-Scale Multiclass Transduction

We present a method for performing transductive inference on very large datasets. Our algorithm is based on multiclass Gaussian processes and is effective whenever the multiplication of the kernel matrix or its inverse with a vector can be computed sufficiently fast. This holds, for instance, for certain graph and string kernels. Transduction is achieved by variational inference over the unlabeled ...
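As a side note on the "certain graph kernels" mentioned above: for the regularized-Laplacian kernel K = (I + βL)⁻¹ of a sparse graph, applying K⁻¹ to a vector is a single sparse matrix-vector product, and applying K is a sparse iterative solve, so both directions are fast. The sketch below is only our illustration of that point (the random graph, β, and all variable names are our choices), not the transduction algorithm from the paper.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.csgraph import laplacian
    from scipy.sparse.linalg import cg

    # Sparse random undirected graph on n nodes.
    n, beta = 2000, 0.5
    rng = np.random.default_rng(0)
    A = sp.random(n, n, density=0.002, random_state=rng)
    A = ((A + A.T) > 0).astype(float)              # symmetric 0/1 adjacency
    L = laplacian(A.tocsr())                       # sparse graph Laplacian
    M = sp.identity(n, format="csr") + beta * L    # M = I + beta*L, i.e. K^{-1}

    v = rng.normal(size=n)
    Kinv_v = M @ v                                 # K^{-1} v: one sparse matvec
    K_v, info = cg(M, v, atol=1e-8)                # K v: iterative solve using only matvecs with M
    assert info == 0                               # CG converged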


Scalable Log Determinants for Gaussian Process Kernel Learning

For applications as varied as Bayesian neural networks, determinantal point processes, elliptical graphical models, and kernel learning for Gaussian processes (GPs), one must compute a log determinant of an n × n positive definite matrix, and its derivatives, leading to prohibitive O(n³) computations. We propose novel O(n) approaches to estimating these quantities from only fast matrix-vector multiplications ...
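One standard way to get such estimates from matrix-vector products alone is stochastic Lanczos quadrature: random probe vectors turn the log determinant into a trace estimate of log(K), and a short Lanczos run per probe approximates each quadratic form. The sketch below is our minimal illustration of that general idea (probe count, Lanczos depth, and the test matrix are ours); the paper's own estimators and derivative computations differ in their details.

    import numpy as np

    def lanczos_tridiag(matvec, v, m):
        """m-step Lanczos with full reorthogonalization; returns the tridiagonal matrix T."""
        n = v.shape[0]
        Q = np.zeros((n, m))
        alpha, beta = np.zeros(m), np.zeros(m - 1)
        Q[:, 0] = v / np.linalg.norm(v)
        for j in range(m):
            w = matvec(Q[:, j])
            alpha[j] = Q[:, j] @ w
            w -= Q[:, :j + 1] @ (Q[:, :j + 1].T @ w)     # orthogonalize against all previous vectors
            if j < m - 1:
                beta[j] = np.linalg.norm(w)              # assumes no Lanczos breakdown (beta > 0)
                Q[:, j + 1] = w / beta[j]
        return np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)

    def logdet_slq(matvec, n, num_probes=30, m=25, seed=0):
        """Estimate log det(K) of an SPD matrix K using only matvecs with K."""
        rng = np.random.default_rng(seed)
        total = 0.0
        for _ in range(num_probes):
            z = rng.choice([-1.0, 1.0], size=n)          # Rademacher probe, ||z||^2 = n
            theta, W = np.linalg.eigh(lanczos_tridiag(matvec, z, m))
            total += n * (W[0, :] ** 2) @ np.log(theta)  # Gauss quadrature for z^T log(K) z
        return total / num_probes

    # Sanity check against the exact value on a small, well-conditioned SPD matrix.
    rng = np.random.default_rng(1)
    B = rng.normal(size=(300, 300))
    K = B @ B.T + 300.0 * np.eye(300)
    print(logdet_slq(lambda x: K @ x, 300), np.linalg.slogdet(K)[1])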


Improved fast Gauss transform User manual

In most kernel-based machine learning algorithms and nonparametric statistics the key computational task is to compute a linear combination of local kernel functions centered on the training data, i.e., f(x) = Σ_{i=1}^{N} q_i k(x, x_i), which is the discrete Gauss transform for the Gaussian kernel. f is the regression/classification function in case of regularized least squares, Gaussian process regression ...


A Parallel Tree Code for Computing Matrix-Vector Products with the Matérn Kernel

The Matérn kernel is one of the most widely used covariance kernels in Gaussian process modeling; however, large-scale computations have long been limited by the expensive dense covariance matrix calculations. As a sequel to our recent paper [Chen et al. 2012], which designed a tree code algorithm for efficiently performing the matrix-vector multiplications with the Matérn kernel, this paper documents ...
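For reference, the Matérn covariance at the commonly used smoothness values ν = 3/2 and ν = 5/2 has simple closed forms, so individual matrix entries in such a matrix-vector product are cheap to evaluate; the tree code's contribution is deciding which interactions must be computed exactly. The sketch below (our own toy evaluation, not the paper's parallel tree code) spells out those closed forms and the dense product they would feed.

    import numpy as np

    def matern_kernel(r, ell, nu):
        """Matern covariance as a function of distance r and length scale ell (nu in {1.5, 2.5})."""
        s = np.sqrt(2.0 * nu) * r / ell
        if nu == 1.5:
            return (1.0 + s) * np.exp(-s)
        if nu == 2.5:
            return (1.0 + s + s ** 2 / 3.0) * np.exp(-s)
        raise ValueError("only the half-integer cases nu = 1.5 and nu = 2.5 are implemented here")

    # Dense Matern matrix-vector product on a toy problem (the step a tree code accelerates).
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(500, 3))
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise distances
    K = matern_kernel(r, ell=0.3, nu=2.5)                        # covariance matrix
    print(K @ rng.normal(size=500))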


Fast matrix-vector product based FGMRES for kernel machines

Algorithms based on kernel methods play a central role in statistical machine learning. At their core are a number of linear algebra operations on matrices of kernel functions which take as arguments the training and testing data. A kernel function Φ(x_i, x_j) generalizes the notion of the similarity between a test and a training point. Given a set of data points X = {x_1, x_2, . . . , x_N}, x_i ∈ R^d, ...
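In scipy terms, the setting this abstract describes can be mimicked by wrapping whatever fast matrix-vector product is available in a LinearOperator and handing it to a Krylov solver. scipy ships ordinary restarted GMRES rather than FGMRES, so the sketch below uses that as a stand-in, and the dense kernel, noise level, and all other choices are our own toy values.

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    rng = np.random.default_rng(0)
    n, sigma, noise = 500, 1.0, 0.5
    X = rng.normal(size=(n, 3))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=n)

    # Dense Gaussian kernel matrix, used here only to supply a matvec; a fast kernel
    # summation would replace 'K @ v' below without ever forming K explicitly.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-d2 / (2.0 * sigma ** 2))

    A = LinearOperator((n, n), matvec=lambda v: K @ v + noise ** 2 * v)
    weights, info = gmres(A, y, atol=1e-8, restart=50)   # solver only ever requests A @ v
    print(info, weights[:5])                             # info == 0 signals convergence

The resulting vector is the usual set of GP prediction coefficients (K + noise² I)⁻¹ y; predicting at a new point then needs only one more kernel row times these weights.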



Journal title:

Volume:   Issue:

Pages:  -

Publication date: 2015